Feature Selection for Ordinal Text Classification

Authors

  • Stefano Baccianella
  • Andrea Esuli
  • Fabrizio Sebastiani
Abstract

Ordinal classification (also known as ordinal regression) is a supervised learning task that consists of estimating the rating of a data item on a fixed, discrete rating scale. This problem is receiving increased attention from the sentiment analysis / opinion mining community, due to the importance of automatically rating large amounts of product review data in digital form. As in other supervised learning tasks such as binary or multiclass classification, feature selection is often needed in order to improve efficiency and to avoid overfitting. However, while feature selection has been extensively studied for other classification tasks, it has not been studied for ordinal classification. In this paper we present six novel feature selection methods that we have specifically devised for ordinal classification, and test them on two datasets of product review data against three methods previously known from the literature, using two learning algorithms from the “support vector regression” tradition. The experimental results show that all six proposed methods largely outperform all three baseline techniques (and are more stable than them by an order of magnitude), on both datasets and for both learning algorithms.

Note: This is a revised and substantially extended version of a paper that appeared as (Baccianella et al., 2010). The order in which the authors are listed is purely alphabetical; each author has contributed equally to this work.

Similar resources

Information Gain Feature Selection for Ordinal Text Classification using Probability Re-distribution

This paper looks at feature selection for ordinal text classification. Typical applications are sentiment and opinion classification, where classes have relationships based on an ordinal scale. We show that standard feature selection using Information Gain (IG) fails to identify discriminatory features, particularly when they are distributed over multiple ordinal classes. This is because inter-...

Selecting Features for Ordinal Text Classification

We present four new feature selection methods for ordinal regression and test them against four different baselines on two large datasets of product reviews.

An Improved Flower Pollination Algorithm with AdaBoost Algorithm for Feature Selection in Text Documents Classification

In recent years, production of text documents has seen an exponential growth, which is the reason why their proper classification seems necessary for better access. One of the main problems of classifying text documents is working in high-dimensional feature space. Feature Selection (FS) is one of the ways to reduce the number of text attributes. So, working with a great bulk of the feature spa...

Using Micro-Documents for Feature Selection: The Case of Ordinal Text Classification

Most popular feature selection methods for text classification (TC) are based on binary information concerning the presence/absence of the feature in each training document. As such, these methods do not exploit term frequency information. In order to overcome this drawback we break down each training document of length k into k training “microdocuments”, each consisting of a single word occurr...

Publication date: 2013